A HIERARCHICAL APPROACH TO INFORMATION SYSTEM DESIGN

BY 
John  Donovan  and  Henry  Jacoby* 

ABSTRACT 

This  paper  presents  an  approach  to  the  development  of  management 
information  systems  that  is  particularly  applicable  to  systems  with  the 
following  characteristics: 

-  several  classes  of  users,  each  of  which  has  a  different 
degree  of  sophistication 

-  complex  and  changing  security  requirements 

-  data  exhibits  complex  and  changing  inter-relationships 

-  needs  of  the  information  system  changing 

-  must  be  built  quickly  and  inexpensively 

-  complex  data  validation  requirements 

The approach is hierarchical in that the functional tasks of the system
are grouped and ordered such that each group depends only on functions of
the group beneath it. Each group is called a level. We maintain that this
approach not only facilitates the implementation of management information
systems that fulfill the above needs but also provides a sound theoretical
basis for investigating properties of completeness, integrity, correctness,
and performance. Some of these theoretical approaches are also presented.
We also maintain that such hierarchical approaches, in general, provide a
practical way of decomposing complex system designs into manageable
implementation schemes. We feel that some of the primitives of the levels
described here will eventually be placed into the hardware architecture of
machines. We have applied this approach to the development of an
information system for public policy decisions regarding energy for the
states, the New England Energy Management Information System (NEEMIS).
PART I

1. Structure of Paper

The  paper  is  divided  into  two  parts.  The  first  exposes  the  key 
problems  in  representing  data  in  a  computer  system  and  gives  an  overview  of 
the  hierarchical  approach.  The  second  gives  detailed  discussion  of  each 
level  of  a  hierarchy  for  information  systems. 

The levels we present are: bare machine, operating system, storage
and retrieval of relations, relational operators, security and validation,
a Data Definition Language (DDL) and a Data Manipulation Language (DML),
report generator and user interface, and modeling facility. We formalize
some levels, give the present state of research at MIT on each level, and
present prospective theorems and potentially fruitful research directions
for each level.

The storage and retrieval level is further divided into multiple
sublevels.

2.  Characterization  of  Data  Problems 

The  basis  of  an  information  system  is  data.  Let  us  address  ourselves 
to  the  problems  of  data  complexity  and  complications.  Figure  1  depicts  an 
example  of  two  data  series  typical  of  those  that  economists  are 
accustomed  to  handling.  One  could  perform  all  sorts  of  statistical 
operations  on  the  inventory  series  --  regressions,  averages,  standard 
deviations,  etc. 

However, for policymaking it is important to store more complex
information, such as the relations between data. For example, Figure 2
depicts four data items: the names of terminals, their addresses, region
numbers, and the inventory value of different types of fuel. Let us further
complicate Figure 2 by representing data in relation to the owners,
suppliers, and all terminals in region 8. Figure 3 depicts these data items
and their relationships. Now visualize what such a diagram would look
like for all terminals in the U.S. and all possible relationships --
a mess! The basic problems are:

-  How  can  such  information  be  represented  logically? 

-  How  should  an  implementor  view  such  data? 

-  What  operators  should  exist  to  manipulate  such  data? 

-  What mechanisms should be available to validate it?

-  How  can  its  protection  be  ensured? 

-  All  of  the  above  within  the  constraints  of: 

-  good  performance  (low  operating  cost  of  system) 

-  recognition of the fact that all the relationships
might change; the types of data series
available might change
Figure 2
Terminal Data

Complex Data

3. Characteristics of Management Information Systems Problems

What  is  so  complicated  about  an  information  system?  The  following 
lists  some  of  those  problems. 

-  Representation  of  data 

-  Storage 

-  Retrieval 

-  Manipulation  of  Data 

-  Use  as  Base  for  Models  for  Public  Policy 

-  Satisfy  different  degree  of  sophistication  of  users 

All of the above problems must be addressed within the constraints of:

-  low  cost 

-  good  performance 

-  users will want changes as the users and the needs of
the system evolve

-  levels  of  users 

4.  Hierarchical  Approach 

The  basic  idea  behind  this  approach  is  to  divide  a  problem  into  groups 
of  functions  (levels).  Each  group  is  ordered  such  that  it  depends  only  on 
the  group  below  it.  Let  us  take  a  simple  example  to  develop  an  intuitive 
feel  for  this  approach. 

Suppose  a  carpenter  wishes  to  build  a  house.  One  view  he  may  have  of 
his  basic  components  would  be  as  is  depicted  in  Figure  4.  That  is,  he 
views  all  his  basic  components  as  wood,  nails,  putty,  glass,  etc. 
A hierarchical approach is depicted in Figure 5, where the carpenter
views his components in levels. The second-level windows are composed of
components only from the levels below. Doors may be composed of windows
and any of the components of the inner level. This carpenter has
simplified the construction task and may also have simplified the
"debugging" task: if a door is not operating correctly and the levels
below it are debugged, then the fault must be in the door. (Note: recursion
is not allowed. Windows cannot have door assemblies in them.)
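The layering rule in the carpenter example can be checked mechanically. The following is a minimal sketch in Python, not part of the original system: the level numbers and the parts lists are assumptions made for illustration (Figure 5 is not reproduced here), and `layering_violations` is a hypothetical helper name.

```python
# Hypothetical component levels for the carpenter example (assumed values).
LEVEL = {"wood": 1, "nails": 1, "putty": 1, "glass": 1, "window": 2, "door": 3}

# Which components each assembly is built from (illustrative only).
USES = {
    "window": ["wood", "glass", "putty"],
    "door": ["wood", "nails", "window"],
}


def layering_violations(level, uses):
    """Return (component, part) pairs where a part is not on a strictly lower level."""
    return [
        (component, part)
        for component, parts in uses.items()
        for part in parts
        if level[part] >= level[component]
    ]


# The design above respects the hierarchy.
violations = layering_violations(LEVEL, USES)

# A window that contains a door breaks the rule and is reported.
bad = layering_violations(LEVEL, {**USES, "window": ["wood", "door"]})
```

Here `violations` is empty, while `bad` reports the pair `("window", "door")`, which is exactly the case the note above forbids.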

Data  Hierarchical  Model 


The motivation here is to choose a logical representation of data
that is divorced from the physical implementation.

Historical:

File  blocking  is  an  example  of  an  early  technique  for  giving  the 
user  of  a  computer  a  different  view  of  data  than  what  appears  in  the  physical 
implementation.  (An  example  of  blocking  is  having  several  logical  records 
placed  into  one  physical  record  on  a  tape.) 

The paramount advantage of giving a user a logical representation
that is independent of and separate from the physical implementation is
that the physical representation may be changed without affecting the
application programmer. This separation is an example of a two-level
hierarchy. One of the first explanations of this hierarchical concept and
its application to file systems appears in [Madnick, 1970], in SYSTEMS
PROGRAMMING [Donovan, 1972], and later in OPERATING SYSTEMS [Madnick &
Donovan, 1974]. The most notable early exposition of this hierarchical
concept appears in "T.H.E. Multiprogramming System" [Dijkstra, 1968].

Extending this two-level hierarchy, we find that the hierarchy depicted in
Figure 6 is applicable to the design, implementation, and study of
information systems.

A user viewing the bare machine sees instructions like "load, add,
store, multiply". A user viewing level 2 sees instructions like "Get more
memory", "Give me a device". A user at level 3 may store and retrieve
tables. A user at the operator level has three operators on tables. For
example, an operator would be "Find all common entries between tables".
A user at the data security level can only access tables under prescribed
rules. A user at the DML and DDL level can use a cryptic English to
query information stored in tables or use the DDL to define new tables.
A user at level 8 may activate packages. A user at level 9 may create and
activate models.

Note that each level may be viewed as the user of the level below
it, as it uses the primitives of the levels below it. For example, a request
at level 8 to "Select all terminals in Lynn, Mass. of over 50,000 gallons
of fuel" would activate the data security level to check the access rights
of such a request. The data security level would use the operators of
level 4 on security tables, which contain information as to the protection
rights on the terminal table. These same operators will be used on the
terminal table to get the desired information. The request is carried
further down the hierarchy until the actual machine instructions and I/O
commands of the bare machine are initiated to satisfy the request.

Figure 4
Non-hierarchical View from Carpenter

Figure 5
Hierarchical View
Figure 6
Hierarchical Implementation

PART II


The basic idea behind a hierarchical implementation is that each level
consists of algorithms that depend only on (call) algorithms in levels
beneath it. No level is allowed to call itself; therefore, recursion on a
level is not allowed.

1.  Level  1:  Machine  Instruction. 

Level 1 presents the user with machine instructions.

2.  Level  2:  The  Operating  System. 

A  user  of  this  level  sees  instructions  such  as  "Give  me  a  device", 
"Give  me  more  memory".  The  operating  system  is  a  resource  manager  in  that 
it  manages  the  resources  of  the  computer,  memory,  CPU  time,  and  devices. 
We  further  divide  the  operating  system  into  sublevels  [OPERATING  SYSTEMS, 
Madnick  &  Donovan,  1974].  For  this  paper  we  will  assume  that  either  an 
operating  system  exists  or  the  reader  will  refer  to  the  references  if  he 
should  have  to  build  one. 

3. Level 3: File Systems (Table Facility)

The function of this level is to store and retrieve tables by symbolic
name. A user at this level sees instructions such as CREATE TABLE, READ
TABLE, WRITE TABLE. Some standard file systems, for example, IMS, MULTICS,
VSAM, etc., have such a facility (simply by equating a table to a file).
To build one from scratch, we advocate that this level be subdivided into
the sublevels of Figure 7.

Let us take an example and briefly describe the functions of each
sublevel.

READ TABLE TERMINAL
Request
   |
Symbolic File System module (SFS)
   |
Basic File System module (BFS)
   |
Access Control Verification module (ACV)
   |
Logical File System module (LFS)
[Access methods, file structure]
   |
Physical File System module (PFS)
[File organization strategy]
   |
   |--- (if write) --- Allocation Strategy module
   |
Device Strategy module    \
   |                       |  Device Management
Initiate I/O               |  (Operating System)
   |                       |
Device Handler            /

Figure 7
Hierarchical Model of a File System
The Symbolic File System's function is to map the symbolic reference
into a unique identifier. This sublevel may read a master directory
to find the unique id of 'TERMINAL'. If recursion were allowed, the SFS
level would call itself to READ master directory (i.e., find the unique id
of the master directory). Since recursion is not allowed, SFS cannot call
itself to find the unique id of the master directory. Thus the master
directory's unique id must be known by the SFS, and to read this directory,
the level below is called, passing the unique id.

We call the process of identifying what each level must know to
prevent recursion "unwinding the recursion".
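"Unwinding the recursion" for the Symbolic File System can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the identifier values, the `STORAGE` dictionary, and the function names `sfs_read` and `bfs_read` are all assumptions.

```python
MASTER_DIRECTORY_ID = 0  # assumed constant that the SFS knows in advance

# Hypothetical storage as seen by the level below: unique id -> contents.
STORAGE = {
    0: {"TERMINAL": 17, "OWNER": 23},  # the master directory itself
    17: [("GULF NO. 48", "LYNN")],
    23: [("ACME OIL", "555-0100")],
}


def bfs_read(unique_id):
    """Level below the SFS: read an object given only its unique id."""
    return STORAGE[unique_id]


def sfs_read(symbolic_name):
    """Symbolic File System: map a name to a unique id, then call downward."""
    # The master directory is read by its known id, so sfs_read never
    # has to call itself to look the directory up by name.
    directory = bfs_read(MASTER_DIRECTORY_ID)
    return bfs_read(directory[symbolic_name])


terminal_rows = sfs_read("TERMINAL")
```

The only thing the upper level must know in advance is the one constant `MASTER_DIRECTORY_ID`; every other lookup goes strictly downward.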

When  the  Basic  File  System  is  called,  it  is  given  the  unique  ID  of  the 
table  requested.  The  function  of  the  BFS  is  to  find  all  the  information 
about  the  table  (e.g.,  its  size,  location,  access,  etc.). 

The  function  of  access  control  verification  is  to  provide  the  mechanism 
for  allowing  sharing  of  tables  by  multiple  users  with  different  access 
rights. 

After access is checked, the Logical File System is called. The
LFS is concerned with mapping the structure of the logical records onto the
linear byte string view of a file.

The primary function of the Physical File System is to perform the
mapping of Logical Byte Addresses into Physical Block Addresses. For
example, the table to be read might be physically scattered in several
different parts of a disk. The LFS might calculate its logical address as
logical bytes 4000 to 5000, whereas the PFS would compute the specific
track and cylinder numbers for each component.
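The division of labor between the LFS and the PFS can be illustrated with a short Python sketch. The block size, the block map, and the function names are assumptions chosen for the example; a half-open byte range is used, so "logical bytes 4000 to 5000" is represented as (4000, 5000).

```python
BLOCK_SIZE = 1000  # assumed block size in bytes

# Hypothetical map for one table: logical block number -> (cylinder, track).
BLOCK_MAP = {0: (3, 1), 1: (3, 2), 2: (9, 0), 3: (9, 1), 4: (12, 5), 5: (1, 7)}


def lfs_byte_range(record_number, record_length):
    """Logical File System: logical record -> half-open logical byte range."""
    start = record_number * record_length
    return start, start + record_length


def pfs_blocks(start, end):
    """Physical File System: logical byte range -> physical block addresses."""
    first = start // BLOCK_SIZE
    last = (end - 1) // BLOCK_SIZE
    return [BLOCK_MAP[n] for n in range(first, last + 1)]


byte_range = lfs_byte_range(record_number=4, record_length=1000)
addresses = pfs_blocks(*byte_range)
```

Neither function needs to know how the other works: the LFS deals only in logical bytes, and the PFS deals only in block addresses, so the physical layout in `BLOCK_MAP` can change without touching the LFS.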

The  Allocation  Strategy  Module  is  only  activated  on  a  write  (or  create) 
request  when  more  space  is  needed  for  a  table  or  a  new  table  is  created. 
This  module  finds  the  space. 
The Device Strategy Module, I/O initiator, and device handler are
actually levels of the operating system. They create channel programs,
request I/O, schedule the I/O, and handle interrupts.

4. Level 4: Relational Operator Level

This hierarchical approach for the design of an information system
does not require that all of the levels below a certain point be
implemented in a hierarchical manner, though we believe that even for the
low levels just discussed this approach is best.

At level 4 we have implemented thirteen operators on tables
(sometimes called relations). The operators are: cartesian product,
union, intersection, projection, diadic restriction, monadic restriction,
join, composition, permutation, compaction, difference, inversion, and
ordering.

4.1 Relational History

Relational representation can be looked upon as a variant of Post
canonical systems [Post: 1943], Church's logical systems [Church: 1941],
or somewhat more recently, Smullyan's elementary formal systems [Smullyan:
1961], and most recently, Donovan's canonic systems [Donovan: 1967].

Codd, however, is recognized as the first to apply this sort of
logical system to the representation of data [Codd: 1971]. Another
information algebra was proposed by R. Bosak [Bosak: 1962].

We know of six attempts to implement data in relational form on a
computer: ISG [Smith: 1973], MACAIMS [Goldstein, Strnad: 1971], SEQUEL
[Chamberlin: 1974], COLARD [Bracchi: 1972], RIL [Fehder: 1972], and M.I.T.'s
RDMS. Most of these implementations are nearly functionally equivalent to
our DDL and DML of the next level.

The  only  practical  application  of  a  relational  system  (that  we  know  of) 
is  to  an  energy  information  system  for  aiding  public  policy  decisions  in 
New  England  [Donovan  &  Jacoby:  1974].  We  have  further  reported  in  that 
paper  an  extension  of  these  concepts  to  include  protection  and  validation 
mechanisms. 

Sloan's contributions to date lie in the area of the first application,
extensions of concepts to handle protection and validation, additional
operators, hierarchical implementation, performance considerations, and a
Data Manipulation and Data Definition Language.

4.2  Relational  Model 

A user at this level views data as relations. A relation in its simplest
form is a table where each column is a domain and each row is an entry. Let
us formalize this concept and define some operators on relations.

The cartesian product of two sets, A and B, is denoted by $A \times B$, and
is defined by:

$A \times B = \{(a, b) : a \in A,\ b \in B\}$

The expanded cartesian product, $X$, of n sets $B_1, B_2, \ldots, B_n$ is
defined by:

$X(B_1, B_2, \ldots, B_n) = \{(b_1, b_2, \ldots, b_n) : b_j \in B_j \text{ for } j = 1, 2, \ldots, n\}$

The elements of such a set are called n-tuples, or just tuples. When
n = 1, $X(B_1) = B_1$, since no distinction is made between a 1-tuple and
its only component.
Suppose $b = (b_1, b_2, \ldots, b_m)$ and $c = (c_1, c_2, \ldots, c_n)$.
The concatenation of b with c is the (m + n)-tuple defined by

$b \| c = (b_1, b_2, \ldots, b_m, c_1, c_2, \ldots, c_n)$.

R is a relation on the sets $(B_1, B_2, \ldots, B_n)$ if it is a subset of
$X(B_1, B_2, \ldots, B_n)$. A relation is accordingly a special kind of set.
Its members are all n-tuples where n is a constant called the degree of the
relation. Relations of degree 1 are called unary, degree 2 binary, degree 3
ternary, degree n n-ary. The sets on which a relation is defined are called
its underlying domains. For data base purposes, we (unlike the earlier
work of Codd and others) are concerned with data consisting of many types,
e.g., integers, characters, floating point, pointers, binary, boolean, etc.

Note that the elements (n-tuples) of a relation have no implied order;
thus insertion and deletion operators are simplified.
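These definitions map directly onto sets of tuples. The sketch below is in Python and is only illustrative: the function names and sample domains are assumptions, and, unlike the convention stated above, Python does distinguish a 1-tuple from its only component.

```python
from itertools import product


def expanded_product(*domains):
    """X(B1, ..., Bn): every n-tuple with its j-th component drawn from Bj."""
    return set(product(*domains))


def concatenate(b, c):
    """b || c: the (m + n)-tuple formed from an m-tuple and an n-tuple."""
    return tuple(b) + tuple(c)


def is_relation(r, *domains):
    """True if r is a subset of X(B1, ..., Bn)."""
    return set(r) <= expanded_product(*domains)


names = {"SMITH", "MACAVOY"}
depts = {6, 15}
employs = {("SMITH", 15), ("MACAVOY", 15)}  # a binary relation (degree 2)

pairs = expanded_product(names, depts)
joined = concatenate(("SMITH", 15), ("BOSTON",))
```

Because a relation is stored as a set, inserting or deleting a tuple never requires preserving an order, which is the simplification noted above.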

4.3  Definition  of  Operators 

If data is represented in this relational model, what are the
appropriate operators?

An operator creates a new relation either from a single relation or
from two relations. Using the mnemonic "diadic" to refer to operators
that operate on two relations and "monadic" for operators that operate on
one, we may divide the logical operators into four categories:


Category                       Operator            Symbol   Type

Traditional set operators      Union               U        Diadic
as applied to relations        Intersection        N        Diadic
                               Cartesian Product   X        Diadic

Relational operators           Projection          P        Monadic
appearing in literature        Join                *        Diadic
                               Composition         •        Diadic
                               Permutation         M        Monadic
                               Restriction         R        Both

Implementation operators       Inversion           I        Monadic
                               Ordering                     Monadic
Notation:

$R_i$ is the name of the ith relation
$c_i$ = cardinality (# members) of the ith relation
$n_i$ = degree of the ith relation
$d_{ij}$ = jth domain of $R_i$; $j = 1, \ldots, n_i$
$v_m(d_{ij})$ = mth value of $d_{ij}$; $m = 1, \ldots, c_i$
$t_i$ = $n_i$-tuple in relation $R_i$,
    i.e.: $t_i = (v_\alpha(d_{i1}), v_\alpha(d_{i2}), \ldots, v_\alpha(d_{in_i}))$, $\alpha \in \{1, \ldots, c_i\}$
$L(\ell)$ = length of list $\ell$
$\forall \alpha$ = "for all values of" $\alpha$
$\emptyset$ = null set (i.e.: $R_i = \emptyset \Rightarrow c_i = 0$)
$\alpha \subseteq \beta$ = $\alpha$ is a subset of $\beta$
$\alpha \subset \beta$ = $\alpha$ is a proper subset of $\beta$ (i.e.: $\alpha \neq \beta$)
The following examples will be used throughout to explain definitions:

R1 =   (name    , soc_sec_#   , phone    , dept_#)
       (SMITH   , 213-07-1666 , 232-1500 , 15)
       (MACAVOY , 621-49-2990 , 356-5175 , 15)
       (JACOBY  , 413-00-0029 , 484-7352 , 6)
       (SMITH   , 839-41-6942 , 253-0410 , 6)

R2 =   (name    , soc_sec_#   , phone    , dept_#)
       (MADNICK , 217-61-7232 , 253-6671 , 15)
       (SMITH   , 213-07-1666 , 232-1500 , 15)
       (MACAVOY , 621-49-2990 , 356-5175 , 15)

R3 =   (person  , age , city)
       (MADNICK , 31  , PEABODY)
       (MACAVOY , 34  , IPSWICH)
       (SMITH   , 23  , BOSTON)

R4 =   (name    , phone)
       (SMITH   , 232-1500)
       (MACAVOY , 356-5175)

R10 =  (person  , age , city    , street_#)
       (MADNICK , 31  , PEABODY , 18)
       (MACAVOY , 34  , IPSWICH , 43)
Example: R6 = R1 N R2; results in:

R6 =   (SMITH   , 213-07-1666 , 232-1500 , 15)
       (MACAVOY , 621-49-2990 , 356-5175 , 15)


Cartesian Product. Suppose you wanted to form a new relation from
two relations where each element of the new relation consisted of every
possible pairing of the elements of the existing relations. The
cartesian product 'X' would perform this task.


$R_i = R_j \times R_k$ (j = k is valid),

Note: if $n_j > 1$ or $n_k > 1$, then each $t_j$ (or $t_k$) must be treated
as a single domain so that effectively $n_j = n_k = 1$.

i.e.
$n_i = n_j + n_k$
$c_i = c_j \cdot c_k$
$t_i = (v_\alpha(d_{j1}), v_\beta(d_{k1}))$; $\alpha = 1, \ldots, c_j$; $\beta = 1, \ldots, c_k$

$R_i$ = {ordered pairs with first member from $R_j$ and
second member from $R_k$}

Example: R5 = R4 x R4; results in:

R5 =   ((SMITH   , 232-1500) , (SMITH   , 232-1500))
       ((SMITH   , 232-1500) , (MACAVOY , 356-5175))
       ((MACAVOY , 356-5175) , (SMITH   , 232-1500))
       ((MACAVOY , 356-5175) , (MACAVOY , 356-5175))
4.3.1 Set Operators as Applied to Relations

Union: Suppose you would like to create a new relation that consisted
of all the elements of two other relations without redundant entries. The
union of two relations would perform this task. Using the symbol U
($\cup$) we may formally define union as

$R_i = R_j \cup R_k$; (j = k is valid),
$c_i = c_j + c_k - c(R_j \cap R_k)$, i.e., duplicates deleted automatically,
    where $c(\cdot)$ denotes cardinality
$n_i = \max\{n_j, n_k\}$
$R_i = \{t_i : t_i \in R_j \text{ OR } t_i \in R_k\}$

Example: R5 = R1 U R2; results in

R5 =   (SMITH   , 213-07-1666 , 232-1500 , 15)
       (MACAVOY , 621-49-2990 , 356-5175 , 15)
       (JACOBY  , 413-00-0029 , 484-7352 , 6)
       (SMITH   , 839-41-6942 , 253-0410 , 6)
       (MADNICK , 217-61-7232 , 253-6671 , 15)

$c_{R5} = 5$, $n_{R5} = 4$


Intersection: Suppose you wish to create a new relation whose
elements consist of only the common elements of two other relations.
The intersection of those two relations would perform this task. Using
the symbol N ($\cap$) we may formally define the intersection operator.

$R_i = R_j \cap R_k$; (i = j = k is valid)
(note that if $n_j \neq n_k$, then $R_i = \emptyset$)

$R_i = \{t_i : t_j = t_k\}$. Duplicate $t_i$ are removed.

$n_i = n_k = n_j$
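The union and intersection examples on R1 and R2 can be reproduced with a short Python sketch. Representing a relation as a set of tuples is an assumption of this sketch, as are the function names; the tuples are those of the example relations.

```python
R1 = {
    ("SMITH", "213-07-1666", "232-1500", 15),
    ("MACAVOY", "621-49-2990", "356-5175", 15),
    ("JACOBY", "413-00-0029", "484-7352", 6),
    ("SMITH", "839-41-6942", "253-0410", 6),
}
R2 = {
    ("MADNICK", "217-61-7232", "253-6671", 15),
    ("SMITH", "213-07-1666", "232-1500", 15),
    ("MACAVOY", "621-49-2990", "356-5175", 15),
}


def degree(relation):
    """Degree n of a non-empty relation (all tuples share one length)."""
    return len(next(iter(relation)))


def union(rj, rk):
    """All tuples in either relation; duplicates vanish because sets are used."""
    return rj | rk


def intersection(rj, rk):
    """Tuples common to both relations; empty if the degrees differ."""
    return rj & rk


r5 = union(R1, R2)
r6 = intersection(R1, R2)
```

With these inputs the cardinalities satisfy the counting rule for union: 4 + 3 - 2 = 5.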
4.3.2 Relational Operators [Codd: 1972]

Projection. Suppose you wish to create a new relation that consists
of only some of the domains of an existing relation. Projection could do
this task. (Projection has its equivalent in propositional calculus,
the existential quantifier [Church: 1943].) Formally, the projection 'P'
is defined

$R_i = R_j \; P \; (d_{j\ell})$; $\ell \subseteq \{1, 2, \ldots, n_j\}$

$n_i = L(\ell)$

$c_i = c_j$ (Note: redundant entries not automatically removed -- use
the "compaction" operator for this purpose)

$R_i = \{d_{j\ell} : \ell \subseteq \{1, 2, \ldots, n_j\}\}$

Example: R5 = R2 P (name, phone); results in:

       (MADNICK , 253-6671)
       (SMITH   , 232-1500)
       (MACAVOY , 356-5175)
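A minimal Python sketch of projection follows, together with a separate compaction step, since projection as defined here keeps redundant entries. Holding a relation as a list of rows plus a tuple of domain names is an assumption of the sketch, and `project` and `compact` are illustrative names.

```python
R2_DOMAINS = ("name", "soc_sec_#", "phone", "dept_#")
R2 = [
    ("MADNICK", "217-61-7232", "253-6671", 15),
    ("SMITH", "213-07-1666", "232-1500", 15),
    ("MACAVOY", "621-49-2990", "356-5175", 15),
]


def project(domains, rows, wanted):
    """Keep only the wanted domains, in the order given; duplicates stay."""
    positions = [domains.index(d) for d in wanted]
    return [tuple(row[p] for p in positions) for row in rows]


def compact(rows):
    """Remove redundant rows while keeping first-seen order."""
    return list(dict.fromkeys(rows))


r5 = project(R2_DOMAINS, R2, ("name", "phone"))
depts = project(R2_DOMAINS, R2, ("dept_#",))
```

Projecting R2 on dept_# alone yields three identical rows, so the cardinality is unchanged; `compact(depts)` then reduces them to one.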

Join. Suppose you wished to create a new relation from two existing
relations such that each element of the new relation was the concatenation
of elements of the existing ones. Further, you only wanted to concatenate
those elements whose domains had certain properties. The joining of these
two relations would do it. Formally, the join '*' is defined (we find
the following definition more natural than Codd's):

$R_i = R_j(d_{j\ell}) * R_k((\theta, d_{km}))$;

$\theta ::= \; > \; \mid \; < \; \mid \; = \; \mid \; \neg\theta$

$\ell \in \{1, 2, \ldots, n_j\}$

$m \in \{1, 2, \ldots, n_k\}$

and $d_{j\ell}$ and $d_{km}$ must be of the same data type (i.e.,
must be joinable).

*Note: We have somewhat changed some of the definitions for implementation
and use reasons. E.g., we never automatically eliminate duplicate
rows in Projection; we do eliminate one of the duplicate columns
in Join.
$n_i = n_j + n_k - 1$ (no duplication of join domain)

where $\gamma = 1, \ldots, c_j$ and $\delta = 1, \ldots, c_k$ index the tuples
of $R_j$ and $R_k$.

Example (1): R6 = R2 (name) * R3 (=, person); results in:

       (soc_sec_#   , phone    , dept_# , name    , age , city)

R6 =  {(217-61-7232 , 253-6671 , 15     , MADNICK , 31  , PEABODY),
       (213-07-1666 , 232-1500 , 15     , SMITH   , 23  , BOSTON),
       (621-49-2990 , 356-5175 , 15     , MACAVOY , 34  , IPSWICH)}

Example (2): R6 = R3 (city) * R4 (>, name); results in:

       (soc_sec_# , age , city    , phone)

R6 =  {(MADNICK   , 31  , PEABODY , 617-1400)}

(Note that conceptually this example does not make sense; it simply
illustrates the use of * when $\theta$ is not "=".)
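Example (1) can be reproduced with the following Python sketch of a join. It is an illustration under stated assumptions, not the authors' implementation: relations are lists of rows with a separate tuple of domain names, only three comparison operators are supported, and the choice to drop the join domain of the first relation is made here so that the column order matches Example (1).

```python
import operator

THETA = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

R2_DOMAINS = ("name", "soc_sec_#", "phone", "dept_#")
R2 = [
    ("MADNICK", "217-61-7232", "253-6671", 15),
    ("SMITH", "213-07-1666", "232-1500", 15),
    ("MACAVOY", "621-49-2990", "356-5175", 15),
]
R3_DOMAINS = ("person", "age", "city")
R3 = [("MADNICK", 31, "PEABODY"), ("MACAVOY", 34, "IPSWICH"), ("SMITH", 23, "BOSTON")]


def join(dj, rj, on_j, dk, rk, theta, on_k):
    """Concatenate tuples of rj and rk whose join domains satisfy theta.

    The join domain of rj is dropped so that it is not duplicated,
    giving a result of degree n_j + n_k - 1.
    """
    pj, pk = dj.index(on_j), dk.index(on_k)
    compare = THETA[theta]
    return [
        tj[:pj] + tj[pj + 1:] + tk
        for tj in rj
        for tk in rk
        if compare(tj[pj], tk[pk])
    ]


r6 = join(R2_DOMAINS, R2, "name", R3_DOMAINS, R3, "=", "person")
```

The nested loop compares every pair of tuples, which is the simplest correct strategy; an inverted domain would allow a faster lookup.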

Composition. Suppose you wanted to do a join operation but you did
not want the duplicate column to appear at all. You would perform a
composition. Composition '•' is formally defined as:

$R_i = R_j(d_{j\ell}) \bullet R_k(d_{km})$

$\ell \in \{1, \ldots, n_j\}$

$m \in \{1, \ldots, n_k\}$

$d_{j\ell}$ and $d_{km}$ must be joinable (of the same data type).

$n_i = n_j + n_k - 2$

$R_i = [R_j(d_{j\ell}) * R_k(d_{km})] \; P \; (d_{j\beta})$, where $\beta$ ranges
over all domains of the join except the join domain
(i.e., remove domain $d_{j\ell}$ on which $R_j$ and $R_k$ were joined)

Example: R5 = R2 (name) • R3 (person); results in:

       (soc_sec_#   , phone    , dept_# , age , city)

R5 =  {(217-61-7232 , 253-6671 , 15     , 31  , PEABODY),
       (213-07-1666 , 232-1500 , 15     , 23  , BOSTON),
       (621-49-2990 , 356-5175 , 15     , 34  , IPSWICH)}

Permutation. Suppose you wish to interchange some domains of a
relation. Permutation 'M' performs this task and is formally defined:

$R_i = R_j \; M \; (d_{j\ell})$; $\ell$ an ordered subset of $\{1, \ldots, n_j\}$, $L(\ell) = n_j$.

$n_i = n_j$; $c_i = c_j$;

The only effect of this operator is to reorder the domains in a
relation.

Example: R5 = R3 M (person, city, age); results in:

       (name    , city    , age)

R5 =  {(MADNICK , PEABODY , 31),
       (MACAVOY , IPSWICH , 34),
       (SMITH   , BOSTON  , 23)}
Diadic Restriction. Perhaps the most common task you might perform
would be to ask for all the elements in a relation A that have anything
to do with the elements of a relation B. This is a restriction of A by B
and is formally defined using the symbol | as:

$R_i = R_j (d_{j\ell}) \mid R_k ((\theta_m, d_{km}))$; $\ell \subseteq \{1, \ldots, n_j\}$, $m \subseteq \{1, \ldots, n_k\}$

where: $L(\ell) = L(m)$

and  n,        n.  <^  n . 

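then: $n_i = n_j$

$\theta_m ::= \; > \; \mid \; < \; \mid \; = \; \mid \; \neg\theta$
$R_i = \{t_j : v_\alpha(d_{j\ell}) \; \theta_m \; v_\beta(d_{km});\ \ell, m = 1, \ldots, L(\ell);\ \alpha = 1, \ldots, c_j;\ \beta = 1, \ldots, c_k\}$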

Example (1):

R6 = R2 (name, phone) | R4 ((=, name), (=, phone)); results in:

R6 =  {(SMITH   , 213-07-1666 , 232-1500 , 15),
       (MACAVOY , 621-49-2990 , 356-5175 , 15)}

Example (2):

R6 = R2 (phone) | R4 ((>, phone)); results in:

R6 =  {(MADNICK , 217-61-7232 , 253-6671 , 15),
       (MACAVOY , 621-49-2990 , 356-5175 , 15)}

(Note: $t_1$ of R6 appears because 253-6671 > 232-1500. The fact
that 253-6671 < 356-5175 does not affect this.)
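The note on Example (2) implies that a tuple is kept if at least one tuple of the restricting relation satisfies the conditions. The Python sketch below follows that reading. It is illustrative only: the list-of-rows representation, the function name `restrict`, and the comparison of phone numbers as character strings are assumptions.

```python
import operator

THETA = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

R2_DOMAINS = ("name", "soc_sec_#", "phone", "dept_#")
R2 = [
    ("MADNICK", "217-61-7232", "253-6671", 15),
    ("SMITH", "213-07-1666", "232-1500", 15),
    ("MACAVOY", "621-49-2990", "356-5175", 15),
]
R4_DOMAINS = ("name", "phone")
R4 = [("SMITH", "232-1500"), ("MACAVOY", "356-5175")]


def restrict(dj, rj, on_j, dk, rk, conditions):
    """Keep each tuple of rj for which SOME tuple of rk meets every condition.

    on_j lists domains of rj; conditions lists (theta, domain of rk) pairs
    of the same length, compared position by position.
    """
    pj = [dj.index(d) for d in on_j]
    tests = [(THETA[theta], dk.index(d)) for theta, d in conditions]
    return [
        tj
        for tj in rj
        if any(
            all(cmp(tj[p], tk[q]) for p, (cmp, q) in zip(pj, tests))
            for tk in rk
        )
    ]


ex1 = restrict(R2_DOMAINS, R2, ("name", "phone"), R4_DOMAINS, R4,
               [("=", "name"), ("=", "phone")])
ex2 = restrict(R2_DOMAINS, R2, ("phone",), R4_DOMAINS, R4, [(">", "phone")])
```

The result always has the degree of the restricted relation, since whole tuples of R2 are kept or dropped and nothing from R4 is appended.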
Monadic Restriction. Suppose you wish to have all the elements of
a single relation that conformed to a condition on a domain. The monadic
restriction of that relation would do that and is formally defined as:

$R_i = R_j (d_{j\ell}) \mid ((\theta_m, d_{jm}))$; $\ell \subseteq \{1, \ldots, n_j\}$

$L(\ell) = L(m)$

$\theta_m ::= \; > \; \mid \; < \; \mid \; = \; \mid \; \neg\theta$

$n_i = n_j$

$R_i = \{t_j : v_\alpha(d_{j\ell}) \; \theta_m \; v_\beta(d_{jm});\ \ell, m = 1, \ldots, L(\ell);\ \alpha, \beta = 1, \ldots, c_j\}$

Example: R6 = R10 (age) | ((<, street_#)); results in:
R6 = {(MACAVOY, 34, IPSWICH, 43)}.


4.3.2.1   Example  Using  Operators 

Let us take an example demonstrating the use of this view and a
series of these operators to satisfy a query. Referring to Figure 0.3,
we may construct a possible representation of this data by the following
five relations:

TERMINAL  (TERNAME,  CITY,  STATE,  REGION) 
INVENTORY  (TERNAME,  FUELTYPE,  INVEN,  CONFIDEN) 
OWNER  (OWNERNAME,  ADDRESS,  PHONE) 
TERMINALOWNER  (TERNAME,  OWNERNAME) 
SUPPLIERS  (SURNAME,  TERNAME,  DISTRIBUTOR...) 
Note that if we had included inventory information in the terminal
relation, there would have been many empty fields since all fuel types
are not present at most terminals.

Let us now use the basic operators of diadic restriction and
projection on the above relations to extract the information needed to
answer the following question.

"WHAT  IS  TELEPHONE  NUMBER  FOR  OWNER  OF  GULF  NO.  48  TERMINAL?" 

The following expression, when evaluated, gives the result:

( OWNER | ( TERMINALOWNER | 'gulf no. 48' [=, TERNAME] ) P(OWNERNAME) ) P(PHONE)

The expression is evaluated starting with the innermost parenthesis.
The TERMINALOWNER relation is restricted with the instance "gulf no. 48",
and the resulting relation is then projected on the OWNERNAME domain, which
results in a relation of all desired owners. This relation is used to
restrict the OWNER relation to produce a new relation containing all
information about the desired owners. This result is then projected on the
phone domain to get the desired phone numbers.
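The evaluation order just described can be traced with a small Python sketch. Only the relation and domain names come from the text; the sample rows, the phone numbers, and the helper names `restrict_eq` and `project` are invented for illustration.

```python
# Hypothetical sample rows; only the domain names come from the text.
TERMINALOWNER = [("GULF NO. 48", "GULF OIL"), ("LYNN NO. 2", "ACME FUEL")]
OWNER = [
    ("GULF OIL", "1 HARBOR ST", "555-0148"),
    ("ACME FUEL", "9 MAIN ST", "555-0102"),
]


def restrict_eq(rows, position, values):
    """Diadic restriction with '=': keep rows whose field matches some value."""
    return [row for row in rows if row[position] in values]


def project(rows, position):
    """Projection on a single domain; duplicates are not removed."""
    return [row[position] for row in rows]


# Innermost step: restrict TERMINALOWNER by the terminal name, project OWNERNAME.
owners = project(restrict_eq(TERMINALOWNER, 0, {"GULF NO. 48"}), 1)

# Outer step: restrict OWNER by those owner names, project PHONE.
phones = project(restrict_eq(OWNER, 0, set(owners)), 2)
```

Nothing in the two steps depends on how many rows the relations hold or on how they are stored, which is the point made in the paragraph that follows.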

Note  that  we  never  were  concerned  with  the  physical  storage  of  this 
data,  nor  were  we  concerned  about  the  number  of  entries  of  the  results. 
The  operators  do  not  depend  on  the  number  of  data  elements,  their  order, 
or  their  type. 
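
The evaluation order described above can be sketched in a few lines of Python. This is only an illustration, not the NEEMIS implementation: relations are modeled as lists of dictionaries, the helper names `restrict` and `project` are hypothetical, and the sample entries (owner names, addresses, phone numbers) are invented.

```python
# Illustrative sketch only: relations are lists of dicts, sample rows are made up.
TERMINALOWNER = [
    {"TERNAME": "gulf no 48", "OWNERNAME": "ACME OIL"},
    {"TERNAME": "shell no 7", "OWNERNAME": "BAY FUELS"},
]
OWNER = [
    {"OWNERNAME": "ACME OIL", "ADDRESS": "1 MAIN ST", "PHONE": "555-0100"},
    {"OWNERNAME": "BAY FUELS", "ADDRESS": "9 DOCK RD", "PHONE": "555-0199"},
]


def restrict(relation, domain, values):
    """Keep the entries whose value in `domain` is one of `values`."""
    wanted = set(values)
    return [entry for entry in relation if entry[domain] in wanted]


def project(relation, domains):
    """Keep only the named domains of every entry."""
    return [{d: entry[d] for d in domains} for entry in relation]


def phone_of_terminal_owner(tername):
    # Innermost step: restrict TERMINALOWNER to the named terminal.
    owned = restrict(TERMINALOWNER, "TERNAME", [tername])
    # Project on OWNERNAME to obtain the desired owners.
    owners = [e["OWNERNAME"] for e in project(owned, ["OWNERNAME"])]
    # Use that relation to restrict OWNER, then project on PHONE.
    return project(restrict(OWNER, "OWNERNAME", owners), ["PHONE"])


result = phone_of_terminal_owner("gulf no 48")
```

Neither helper inspects how the entries are stored, how many there are, or in what order they appear, which is the property the paragraph above emphasizes.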

4.3.3  Implementation  Operators 

There are two operators that we have implemented in the system. We call
them implementation operators as they are not necessary in the
theoretical relational calculus, but when used effectively, they can
greatly assist performance of the practical computer implementation.

Inversion Operator:

Often a relation will be accessed by queries on specific domains
or a combination of domains. The data base management system would then
maintain additional information -- conceptually indices for the specified
domains -- to facilitate efficient handling of frequent requests. These
domains are specified by the user as 'keys' or inversions and are
maintained by the system.

We  call  this  an  implementation  operation  in  that  in  the  relational 
calculus  we  may  access  any  domain,  and  similarly  any  implementation  of  this 
model  could  access  any  element  by  specifying  the  values  of  domains.  This 
general  access  could  only  be  accomplished  by  a  linear  search  of  each 
element,  as  elements  are  not  ordered.  However,  an  inverted  domain  could 
employ  more  efficient  search  techniques. 
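
The contrast between a linear search and an inverted domain can be sketched as follows. This is a hedged illustration rather than the system's actual mechanism: the index is a plain Python dictionary from domain value to entry positions, the function names are hypothetical, and the sample terminals are invented.

```python
from collections import defaultdict

# Made-up sample relation; entries are deliberately in no particular order.
TERMINAL = [
    {"TERNAME": "gulf no 48", "CITY": "LYNN"},
    {"TERNAME": "shell no 7", "CITY": "BOSTON"},
    {"TERNAME": "esso no 3", "CITY": "LYNN"},
]


def linear_search(relation, domain, value):
    """General access: examine every entry, since entries are unordered."""
    return [entry for entry in relation if entry[domain] == value]


def build_inversion(relation, domain):
    """Maintain, for one domain, a map from each value to entry positions."""
    index = defaultdict(list)
    for position, entry in enumerate(relation):
        index[entry[domain]].append(position)
    return index


def inverted_search(relation, inversion, value):
    """Access through the inversion: only matching entries are touched."""
    return [relation[p] for p in inversion.get(value, [])]


city_inversion = build_inversion(TERMINAL, "CITY")
```

Both searches return the same entries; the inversion trades extra storage and maintenance on insert for not having to scan the whole relation.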

Ordering Operator:

Another implementation operator that we are implementing (see future
memo of Grant Smith) is the ordering operator. As the relational
mathematics do not use the inversion operator, the mathematics assumes
all entries in a relation are unordered. However, often for reports or
for selection it is desirable to order either a relation's entries or its
domains.

4.3.4. New Operators

We find the following two operators to be helpful in operating on
data at the relational level. These operators were defined and
implemented in the Sloan system [Smith: 1974].

Difference. Suppose you wished the inverse of restriction. That
is, suppose you wished to obtain all the elements of a relation A that
have nothing to do with the elements of a relation B. The difference of
A and B (A-B) gives a relation composed of such elements. Formally,

$$R_i = R_j - R_k$$

(Note that if $n_j \neq n_k$, then $n_i = \max(n_j, n_k)$ and $R_i = R_j$.)

$$n_i = n_j = n_k$$

$$c_i = c_j - c(R_j \cap R_k)$$

$$R_i = \{\, t_i : t_i \in R_j \text{ and } t_i \notin R_k \,\}$$


Example: R6 = R1 - R2; results in:
R6 = {(GRANGER, 413-00-0029, 536-5176, 6),
      (SMITH, 839-41-6942, 253-0410, 6)}

Compaction. If you took the projection of a relation, it is possible
for you to get a new relation with multiple entries of the same element.
If you then wished to eliminate these duplicate elements you could perform
a compaction. Formally, a compaction 'C' is defined:

$$R_i = C(R_j); \quad (i = j \text{ is valid})$$

$$n_i = n_j$$

$$R_i = \{\, t_i : t_i \neq t_\ell \ \ \forall\, \ell \neq i \,\} \qquad [\text{or: } R_i = R_j \cap R_j]$$
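
A minimal Python sketch of the two operators follows, assuming entries are represented as tuples and relations as lists. The function names are hypothetical, and the JONES entry in R2 is invented so that the difference has something to remove; the GRANGER and SMITH entries are taken from the example above.

```python
def difference(r_j, r_k):
    """Entries of r_j that do not appear in r_k."""
    excluded = set(r_k)
    return [t for t in r_j if t not in excluded]


def compaction(r_j):
    """Remove duplicate entries, keeping the first occurrence of each."""
    seen = set()
    compacted = []
    for t in r_j:
        if t not in seen:
            seen.add(t)
            compacted.append(t)
    return compacted


# GRANGER and SMITH come from the text; JONES is a made-up entry.
R2 = [("JONES", "000-00-0000", "000-0000", 5)]
R1 = [
    ("GRANGER", "413-00-0029", "536-5176", 6),
    ("JONES", "000-00-0000", "000-0000", 5),
    ("SMITH", "839-41-6942", "253-0410", 6),
]
R6 = difference(R1, R2)

# Projecting R1 on its last domain yields duplicates that compaction removes.
last_domain = [(t[3],) for t in R1]
compacted = compaction(last_domain)
```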

5. Level 5: Security, Validation, and Performance

By using relations about relations, this level extends the relational
model to provide security, validation, and performance information. That
is, all relations in the system have associated with them protection,
validation, and performance information that is kept in another relation.
On all requests, this level checks (by use of the relational operators of
the level below acting on the associated protection relation) the access
rights of all relations before any access is made. It also examines the
validation information associated with a relation before any inserts are
made into that relation. Lastly, it updates performance information
associated with all relations before any access is made to them.

In  our  present  implementation,  access  control  is  applied  to  two 
general  aspects  of  the  system: 

1)  The  structure  of  the  system 

2) The contents (data) of the system.

The  types  of  access  that  can  be  specified  for  system  structure 
control  are: 

a) read -- user may see system descriptors

b) delete -- user may delete parts of the system

c) modify -- may change existing structure, but not delete it

d) insert -- user may define new relations, but not alter
existing ones

e) owner -- user created relation, and so can do anything
with it, including giving other people access rights to
it, or denying himself certain rights

f) trap -- this invokes a monitoring program to oversee any
actions the user may perform

The facility exists within the current implementation of this level
to add an additional 12 controls, without making changes to existing
structures. These controls may be applied to relations as a whole or to
individual domains.
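
The idea of keeping protection information in a relation and consulting it with the lower-level operators before any access can be sketched as below. This is an assumption-laden illustration, not the NEEMIS code: the layout of the protection relation (user, relation name, access type), the function names, and the sample users are all hypothetical, and the trap mode's monitoring program is not modeled.

```python
# Hypothetical protection relation: one entry per (user, relation, access type).
PROTECTION = [
    ("JONES", "EMPLOYEE", "read"),
    ("JONES", "EMPLOYEE", "insert"),
    ("SMITH", "EMPLOYEE", "owner"),
]
ACCESS_TYPES = {"read", "delete", "modify", "insert", "owner", "trap"}


def restrict(relation, position, value):
    """Lower-level operator: keep entries whose domain at `position` equals `value`."""
    return [entry for entry in relation if entry[position] == value]


def has_access(user, relation_name, access):
    """Check the protection relation before any access is made."""
    if access not in ACCESS_TYPES:
        raise ValueError(f"unknown access type: {access}")
    rows = restrict(restrict(PROTECTION, 0, user), 1, relation_name)
    granted = {row[2] for row in rows}
    # An owner can do anything with the relation.
    return access in granted or "owner" in granted


def guarded_read(user, relation_name, data_base):
    """Return the relation's entries only if the user holds read access."""
    if not has_access(user, relation_name, "read"):
        raise PermissionError(f"{user} may not read {relation_name}")
    return list(data_base[relation_name])
```

Because the rights live in an ordinary relation, granting or revoking access is an insert or delete on `PROTECTION` and requires no change to the protected relation itself.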

6. Level 6: DDL/DML

This level presents to the user a Data Definition and a Data Manipulation
Language. A Data Definition Language (DDL) allows a user to specify the
structure and form of the data base. The DDL will accept this
specification and will produce an appropriate relational data base system.
The DDL also provides a facility for loading bulk data into the newly
constructed relational system. Such loading of bulk data would be from
punched cards, from magnetic tapes, or from a computer magnetic disk file.

The Data Manipulation Language (DML) is a language that allows a
user to query any data series stored in a relational system. An internal
M.I.T. document, "The Internal Intermediate Language", dated April 29, 1974,
describes a complete DDL and DML that have been specified at M.I.T. This
work is further evolving out of research by Smith, Madnick, and Donovan.
That document is an updated version of a DDL and DML specified in November,
1974 (Smith). For the NEEMIS facility in 1975 we will deliver a working and
debugged subset of the DDL and DML described in the referenced document.
The following section describes that subset.

6.1   An  Example  of  Using  DDL 

With  most  information  management  systems,  the  design  of  the  system  -- 
that  is,  the  design  of  the  data  base  --  is  a  vital  step  in  the  operation. 
If  done  incorrectly,  it  is  often  impossible,  and  usually  extremely  costly 
(in  dollars  and  man  years)  to  restructure  the  data  base  to  more  ably  suit 
the  needs. 

Not  so  with  NEEMIS  (the  New  England  Energy  Management  Information 
System)  [Donovan  &  Jacoby:  1974]. 


A  sample  session  of  the  DDL  might  be: 

system:  ENTER COMMAND:

user:    define domains
         name character, soc_sec_# numeric 9,
         zip character, age num 3, address char;

system:  ENTER COMMAND:

user:    create relation
         employee (name, soc_sec_#, age) (primary key 1),
         employee_data (name, address, zip) (pk 1, required 2);

system:  RELATIONS DEFINED
         ENTER COMMAND

user:    stop.

This session would establish the two relations and permit data to be
entered immediately.

Given  this  simplicity  and  flexibility,  redefining  the  data  base 
ceases  to  be  a  major  task. 

Facilities  of  the  Definition  Sublanguage 

The user first defines the domains he will use in some relation, as
well as the type of data that will appear in that domain. (The digit
following "numeric" specifies the maximum number of digits that can appear
in a value of the domain.) The system makes use of the data type
information when the user enters data -- it can automatically check to see
that the value being entered matches what the domain expects (e.g., 19B is
not a valid number).

Once the domains are defined, the user defines the relations.
Definition of a relation consists of:

a)  a  name  for  the  relation 

b)  the  domains  of  that  relation 

c)  options. 

The  options  are: 

1) Primary key (or "pk"), which specifies which domains are
to be used as the primary key

2) Required (or "req"), which tells the system that certain domains
must have values when the data is entered, otherwise it
will not be accepted by the system. (Primary key domains
are required by default.) For example, in "employee_data"
the system will not accept data unless it includes a name
and an address.
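
The two checks described in this section, type checking against the domain definition and enforcement of required domains, can be sketched as follows. This is a hypothetical illustration: the dictionaries `DOMAINS` and `RELATIONS` and the function names are invented stand-ins for whatever the DDL actually records, and values are assumed to arrive as character strings.

```python
# Hypothetical record of the sample session's definitions.
# Each domain maps to (data type, maximum number of digits or None).
DOMAINS = {
    "name": ("character", None),
    "soc_sec_#": ("numeric", 9),
    "zip": ("character", None),
    "age": ("numeric", 3),
    "address": ("character", None),
}
RELATIONS = {
    "employee_data": {
        "domains": ["name", "address", "zip"],
        "primary_key": ["name"],   # pk 1
        "required": ["address"],   # required 2
    },
}


def valid_value(domain, value):
    """Check that a value matches what the domain expects."""
    if not isinstance(value, str):
        return False
    kind, max_digits = DOMAINS[domain]
    if kind == "numeric":
        return value.isdigit() and (max_digits is None or len(value) <= max_digits)
    return True


def check_entry(relation_name, entry):
    """Return a list of reasons the entry would be rejected (empty if accepted)."""
    spec = RELATIONS[relation_name]
    errors = []
    # Primary key domains are required by default.
    for domain in spec["primary_key"] + spec["required"]:
        if not entry.get(domain):
            errors.append(f"missing required domain: {domain}")
    for domain, value in entry.items():
        if domain not in spec["domains"]:
            errors.append(f"unknown domain: {domain}")
        elif value and not valid_value(domain, value):
            errors.append(f"bad value for {domain}: {value!r}")
    return errors
```

For instance, an `employee_data` entry with a name but no address is rejected, while one with both is accepted even though `zip` is absent.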

6.2 Syntax Specification of DDL

(using BNF, see Donovan, SYSTEMS PROGRAMMING)

The following table defines all the commands and options of the DDL
which will be implemented by November, 1975. For further details and
examples of this subset of DDL, refer to Section 2 of the previously
referenced document by Smith.

In the table, the * indicates that the feature in question is
implemented in the prototype NEEMIS system.


Definition Statements

*DEFINE DOMAIN[S]

data specifiable for domains:

1) *<domain name> ::= any character string < 40 characters

2) <data type> ::= CHAR[ACTER]* | NUM[ERIC] <range>* |
                   BIT {(<size>) | (<info>)} |
                   FLOAT[ING] <range> | INT[EGER] <range> |
                   CHOICE (<value>,...) | INTERNAL

2.1) <range> ::= <lower>, <upper> | <upper>*

2.2) <size> ::= <positive integer>

3) IN <relation name>

4) P[ERMIT] <type> ACCESS [TO] {<user name> | (<user name>,...)}

   <type> ::= <access> | (<access>,...)
   <access> ::= any of 19 modes

Note 1: (3) is mutually exclusive with (2)

Note 2: for BIT data, the (<info>) is used to name bits in the field --
e.g., DEFINE DOMAIN ACCESS_WORD BIT (READ, WRITE...); READ is bit 1, etc.

*DEFINE RELATION

1) *<name> (<domain>,...) [(<options>)]

1.1) <name> ::= any character string < 40 characters

1.2) <domain name> ::= any predefined domain name
                       OR <existing relation name>.TID

1.3) <options> ::= <option>, <options> | <option>
     <option>  ::= *PRIMARY KEY (<domain>[,...]) |
                   *REQ[UIRED] (<domain>[,...]) |
                   *INV[ERT] (<domain>[,...])

2) P[ERMIT] <type> ACCESS [TO] ...same as DEFINE DOMAIN


*DROP {(<relation>,...) |
       DOMAIN (<domain>,...) IN {ALL | RELATION <relation name>}}


DEFINE INVERSION[S] IN <relation> (<domain name>,...)


SET {MONITOR | DEBUG | ERRORMSG} {ON | OFF}



6.3  Implementation  of  the  DDL 

Again, using the hierarchical approach, this level is simply and
relatively straightforwardly implemented using the operators of the level
below. A DROP command causes an entry in the master relation to be
eliminated. The operators of the relational operator level are used to
find this entry in the master relation directory. The master directory is
a relation containing an entry for each relation created by the DDL. The
domains of the master relation describe the domains of the relations
created.

6.4. The Data Manipulation Language (DML)

This  language  allows  the  user  to  access  data  stored  in  a  relational 
system.  The  entire  DML  is  described  in  section  3  of  the  previously 
referenced  Smith  document. 


In the subset of DML that will be operational by November, 1975 we
will include five very powerful commands.

i)   DISPLAY   retrieves specified data and prints it on the
               desired device (e.g., the console).
ii)  DELETE    deletes specified entries or relations from the data
               base.
iii) GET INTO  retrieves specified data and places it into a specified
               file. This command provides the mechanisms for
               transferring data stored in the relational data base to a
               modelling system such as TROLL.
iv)  INSERT    places data into a specified entry in the data base.
v)   UPDATE    changes data in the data base.

6.5 Example of Use of DML in Accessing NEEMIS

The following are sample queries against a data base which contains
the tables:

TERMINAL (TERMINALID, NAME, CITY, STATE, ZIP CODE, AFFILIATION)
SUPPLY CAPACITY (TERMINALID, FUELTYPE, FUELAMT, DATE)
SUPPLIER (SUPPLIERNO, NAME, VOLUME, FUELTYPE, DISTNO)
DISTRIBUTORS (DISTNO, NAME, ADDRESS, CITY, STATE, INVENTORY,
FUELTYPE)

Question  1 

DISPLAY TERMINAL (NAME, ADDRESS, CITY)
FOR STATE = 'MASSACHUSETTS';


This  question  causes  the  listing  of  the  name,  address  and  city  of  all 
terminals  in  the  state  of  Massachusetts. 

Question 2

DISPLAY TERMINAL (NAME, CITY) FOR FUELAMT > 1000, FUELTYPE = 'GASOLINE',
CITY = 'LYNN', DATE = 'NOV74';

This lists the name and city of all terminals in Lynn which have over
1000 gallons of gasoline on hand in November, 1974.

6.6 Complete Syntax Specification of DML

The complete syntax of these commands is given below. Included are
all options and arguments that will be available as of November, 1975.
The * next to an item indicates that this facility exists in the present
NEEMIS prototype.


{DISPLAY* | DELETE | GET INTO <file name> (<format info>)}
    <domain>[,...]*
    FOR {[IN] <domain> = '<value>' [,...]* | ALL}


INSERT* {FROM <file name> (<format info>) |
         IN <relation>* {<domain> = '<value>' [,...]* |
                         ('<value>', '<value>', '<value>' [,...])}}


UPDATE {<domain name> TO {'<new value>' |
                          EXCLUDE ('<value>'[,...]) |
                          INCLUDE ('<value>'[,...])} |
        ENTRY IN <relation> TO {(<domain> = '<value>',...) |
                                ('<value>',...)}}
    FOR <domain> = '<val>',...

(The EXCLUDE/INCLUDE ('<value>'[,...]) option is primarily for BIT data --
such as access control words. It does not reset any other bits -- only
the one(s) affected.)

e.g.: UPDATE ACCESS_WORD TO EXCLUDE ('READ') FOR USER_NAME =
'JONES', DOMAIN = 'SALARY';
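
The behavior of INCLUDE and EXCLUDE on a BIT domain can be sketched with ordinary bit masks. This is an illustration under stated assumptions: the bit names follow the ACCESS_WORD example, bit 1 is assumed to be the least significant bit, and the function names are hypothetical.

```python
# Bit names in the order given when the BIT domain was defined; READ is bit 1.
BIT_NAMES = ["READ", "WRITE"]


def mask(names):
    """Build a mask with one bit set per named bit (bit 1 = least significant)."""
    result = 0
    for name in names:
        result |= 1 << BIT_NAMES.index(name)
    return result


def include(word, names):
    """Set the named bits; all other bits are left unchanged."""
    return word | mask(names)


def exclude(word, names):
    """Clear the named bits; all other bits are left unchanged."""
    return word & ~mask(names)


access_word = include(0, ["READ", "WRITE"])
after_exclude = exclude(access_word, ["READ"])
```

Excluding READ from a word that grants READ and WRITE leaves only the WRITE bit set, and excluding it a second time changes nothing.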


(All commands are terminated by a semicolon, ';'.)


(NOTE:  The  syntax  is  continually  being  revised  and  the  above  represents 
the  DDL,  DML  as  of  November  1975.) 

6.7 Implementation of DML

The DML consists of a parser that recognizes the key words and
translates the request into the appropriate sequence of operations of the
operator level. For example, the two DML commands

DISPLAY TERMOWNER.OWNERNAME FOR TERNAME = 'GULF 48'

DISPLAY OWNER.PHONE FOR OWNERNAME = '<result of first command>'

are translated into the operator sequence given in the example of the
previous level.
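
A toy version of this translation step is sketched below. It is not the NEEMIS parser: it recognizes only one simplified form of DISPLAY (a single `relation.domain` target and a single quoted equality condition), and the operator names RESTRICT and PROJECT and the tuple layout of the output are assumptions made for illustration.

```python
import re

# Recognizes only: DISPLAY <relation>.<domain> FOR <domain> = '<value>' [;]
DISPLAY_PATTERN = re.compile(
    r"^DISPLAY\s+(\w+)\.(\w+)\s+FOR\s+(\w+)\s*=\s*'([^']*)'\s*;?$",
    re.IGNORECASE,
)


def translate_display(command):
    """Translate a simplified DISPLAY command into an operator sequence."""
    match = DISPLAY_PATTERN.match(command.strip())
    if match is None:
        raise ValueError(f"unrecognized command: {command!r}")
    relation, target, domain, value = match.groups()
    return [
        # First restrict the relation on the condition ...
        ("RESTRICT", relation, domain, "=", value),
        # ... then project the result on the requested domain.
        ("PROJECT", target),
    ]


sequence = translate_display(
    "DISPLAY TERMOWNER.OWNERNAME FOR TERNAME = 'GULF 48';"
)
```

A fuller translator would feed the result of one command into the condition of the next, as in the two-command example above.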

Level 7:

At level 7 we have implemented a bulk loader that allows loading of
data from cards into a relation. We have written various graphic query
languages and facilities. We have interfaced the system to virtually any
terminal. At this level we have implemented an enter command that allows
a user to access operators at lower levels.

7. Performance

At each level of the hierarchy the amount of time to execute the
programs and the amount of storage necessary for these programs is of
concern.

At the time of writing we have done very little analysis of the
performance of our implementation or of our application to energy. IBM has
recently (effective November 1974) signed a two-year joint study agreement
with M.I.T.'s Sloan School, a major part of which is to investigate
performance issues, specifically, a comparison and analysis of Sloan
School's implementation and IBM's implementation of a relational data
model, as well as Sloan's application to energy. Further, ARPA is
considering a proposal, to be formally submitted in 1975, addressing the
extensions of our approach to very large data bases.

Performance  is  an  important  issue  of  each  level   of  the  hierarchical 
model   of  data. 

Basically,   for  our  performance  work  we  employ  two  approaches: 

(1)  Analytical  approach 

(2)  Empirical  approach 

For our analytical research we will develop models of each level of
the hierarchy of Figure 6. For our empirical work at each level we
will define controlled experiments and take empirical measurements.
The purpose of the empirical data is twofold:

1) To verify our analytic models. We want to be assured that we
have considered all influential factors in our formulation of
models.

2) To initialize parameters of models. Some models will involve
parameters that must be initialized. Values of these parameters will
be determined empirically.

Example: 

An example of this dual approach is our analysis of the paging
performance within Level 2.

In OPERATING SYSTEMS [Madnick and Donovan: 1974] and in Madnick's
Ph.D. thesis [Madnick: 1973] we built a model of paging and proved that
there exist cyclic program references that can cause page fetch frequency
to increase significantly if the page size used is decreased (e.g.,
reduced by half). Furthermore, the proof of the theorem below shows that
the limit to this increase is a linear function of primary storage size
(the more memory, the worse performance can be!!).

THEOREM

For any two demand-fetch LRU-removal two-level storage systems, S
and S', with page sizes $N$ and $N' = N/2$ and primary store sizes $|M_1|$
and $|M_1|' = 2|M_1|$, respectively, there exists a cyclic page trace,
$P = (P_c)^*$, where $|P_c| = 2(|M_1|' + 1)$, such that the steady-state
page fetch frequency ratio, $r$, equals $|M_1| + 1$.

PROOF: See Madnick [Madnick: 1973].

Further, to minimize the bad effect of this anomaly and yet gain the
potential good effects of reducing page size, we presented an algorithm that
Madnick calls tuple coupling. This algorithm can be added to any existing
paging algorithm to limit the bad effects of reducing page size.


THEOREM: 

For  any  two  demand-fetch  two-level  storage  systems,  S  and  S',  with 
page  sizes  N  and  N'=N/2,  respectively,  the  use  of  the  "tuple-coupling" 
approach  for  S'  in  conjunction  with  a  removal  algorithm  that  is  "tuple- 
couple-able"  is  sufficient  to  guarantee  that  the  page  fetch  frequency 
ratio,  r,  cannot  exceed  the  value  2  for  all  possible  page  traces,  P. 


On the empirical side, two people independently [Hatfield: 1972;
Seligman: 1968] have performed experiments that verified that the
theoretical phenomenon not only occurs but can occur frequently. Hatfield
performed studies in the hardware environment of the IBM 360 Model 67
with programs running under CP-67/CMS. Seligman observed the same
situation in a cache system with much smaller page sizes. In a more
recent work [Donovan and Madnick: 1975] we developed a probabilistic
model of Level 1 for analysis of data security and privacy of this level.

Possible Models of Level 3:

Level 3 is the file system that is concerned with the physical
representation of data. One possible physical organization of a series of
data is in a tree as in Figure 3.1. (For example, organize one level of
nodes by a person's last name, the next level by first name.) Questions
that come to mind immediately are: What would be the number of levels and
nodes to minimize the number of accesses? What is the minimum access path
to a particular node? What are the best searching algorithms? A possible
analytical approach to the second question may be taken by using a
topographical representation of the tree of Figure 3.1 and assigning truth
values to each link and devising functions to minimize the paths to a
given node.

Figure 3.1
Tree Structure as a Physical Representation

The  first  question  of  what  is  the  minimum  number  of  nodes  can  be 
addressed  using  analyses  similar  to  those  of  Severence's.  Another  way 
of  expressing  this  question  is  to  find  a  compromise  between  the  number  of 
comparisons  performed  in  the  branching  nodes  at  each  level  and  the  number 
of  levels  that  must  be  visited. 


Using an approach similar to Severence's, let $z$ be the branching
factor for a balanced tree containing $N$ equally referenced leaves. For
each retrieval the number of levels visited is

$$L = \log_z N$$

and the average number of comparisons at each level is

$$C = \frac{z + 1}{2}.$$

To minimize the total number of comparisons, we seek

$$\min_z \; L \cdot C = \min_z \; \frac{z + 1}{2} \, \log_z N .$$

Re-expressing the logarithm of base $z$ in base $e$ gives
$L \cdot C = \frac{(z + 1) \ln N}{2 \ln z}$; taking the derivative with
respect to $z$ and equating it to 0 gives $z \ln z = z + 1$, whose
solution is

$$z \approx 3.6 .$$

This analysis indicates the expected branching factor would be 3 or 4.
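
The minimization can be checked numerically. The sketch below is only a verification aid with hypothetical function names: it solves the stationarity condition $z \ln z = z + 1$ by bisection and compares the cost $\frac{z+1}{2}\log_z N$ at integer branching factors for an arbitrary choice of N = 1000 leaves.

```python
import math


def comparisons_per_retrieval(z, n_leaves):
    """Total expected comparisons: (z + 1) / 2 per level times log_z(N) levels."""
    return (z + 1) / 2 * math.log(n_leaves) / math.log(z)


def optimal_branching_factor(lo=2.0, hi=6.0, tolerance=1e-9):
    """Solve z * ln(z) = z + 1 by bisection on the interval [lo, hi]."""
    def g(z):
        return z * math.log(z) - z - 1

    # g is increasing for z > 1, negative at 2 and positive at 6.
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = optimal_branching_factor()
best_integer = min(range(2, 11), key=lambda z: comparisons_per_retrieval(z, 1000))
```

The continuous optimum does not depend on N, since $\ln N$ is a constant factor; among integers, 4 gives a slightly lower cost than 3 under this model.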

The questions of best search algorithms have been analyzed and
addressed in the literature [Donovan: 1972; Knuth: 1973; see also the
bibliographies in these books]. Since storage is becoming so inexpensive
and computer time critical, the trend is not to store elements in trees
but rather to use hash searching techniques [Donovan: 1972]. This
technique is accomplished by storing data in a memory location whose value
is computed by some function of the key.

Several techniques are commonly used when two keys result in the
same value. One is chained overflow, where if a key is mapped into a
location previously occupied, a new location is found and a pointer to it
is placed in the first location. Several analytical models have been
developed to assist in determining the expected number of accesses
required to retrieve a record. They all follow similar formulations and
result in the same conclusions. Using chaining and a bucket size of 1,
the expected number is:

1 + P/2

where P is the loading factor. That is, P = N/M, where N equals the number
of records and M is the size of physical storage, assuming a separate
overflow area.
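
The 1 + P/2 estimate can be compared against a small simulation. The sketch below is an illustration under explicit assumptions: the hash function is modeled as a uniformly random choice of slot, every stored record is assumed equally likely to be retrieved, and a record at position k of its chain costs k accesses. The function names and the sizes N = 5000, M = 10000 are arbitrary choices.

```python
import random


def average_probes(num_records, num_slots, seed=0):
    """Simulate chained overflow and return the mean accesses per retrieval."""
    rng = random.Random(seed)
    chain_lengths = [0] * num_slots
    total_probes = 0
    for _ in range(num_records):
        # Assumption: the hash function spreads keys uniformly over the slots.
        slot = rng.randrange(num_slots)
        chain_lengths[slot] += 1
        # A record appended at position k of its chain needs k accesses to find.
        total_probes += chain_lengths[slot]
    return total_probes / num_records


def predicted_probes(num_records, num_slots):
    """The analytical estimate 1 + P/2 with loading factor P = N/M."""
    return 1 + (num_records / num_slots) / 2


simulated = average_probes(5000, 10000)
predicted = predicted_probes(5000, 10000)
```

With P = 0.5 the estimate is 1.25 accesses per retrieval, and the simulated mean should fall close to that value.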

Analytical models in this area are based on probability theory.


8.  Further  Issues  and  Investigations  in  Hierarchies 

The  following  issues  are  being  or  need  to  be  investigated: 

-  Data about Data -- a quantitative theory for dealing
with data of various degrees of confidence. Much of the
data we have been dealing with is energy data.

-  Adaptive Restructuring -- develop a theory using the
performance information on access tables to restructure data
to make future access more efficient.

-  Decision Rules for Efficient Queries -- at the relational
level develop criteria that would determine under what
conditions which operator should be involved, e.g., decide whether
a join or a simple search should be performed.

-  Multiple Access -- Highly active multiple users impose
constraints of locking shared tables. The analysis of the
implications of multiple access users at all levels will take
place.

The following unproven theorems are proposed:

Theorem 1: Each element in the hierarchy has a well-defined set
of operators, $\pi_{ij}$, where $i$ is the level and $j$ differentiates
between the operators.

Theorem 2: The set of operators at each level is equivalent to
axiomatic set theory. (Proof would formulate all the set theoretic
operators in terms of the operators at that level.)

Theorem 3: If the operators $\pi_{ij}$ are secure at every level
$i \le k$, then any security violation must be at level $n$ where
$k < n \le \max i$. (Proof would define security and show that, because of
the hierarchical implementation, namely, only calls inward are permitted,
it is impossible to have a security violation ripple down through secure
levels.)

Theorem 4: There exists a procedure for insuring system reliability
of all systems implemented in a hierarchical fashion.

9. Further Investigations in Relations

Our  future  investigations  at  the  relational  level  would  include: 

-  Definition  of  new  operators  that  can  be  implemented  in  micro-code. 

-  Develop  algebraic  techniques  for  reducing  relational  expression 
into  some  minimal  set  of  operators. 

-  Formally  prove  the  consistency  and  completeness  of  these 
new  sets  of  operators. 

Algebraic  Techniques  for  Reduction: 

The objective here is to find other operators that are more easily
implemented, to show their equivalency, to further develop techniques
for taking expressions of any relational operators, to reduce those
expressions to equivalent expressions using only efficient operators, and
finally, to reduce these expressions to minimal computational expressions.
We propose the following theorems to accomplish the above objectives with
proofs and blanks to be completed in the ensuing years.


Define a set of operators ($\pi_1, \pi_2, \pi_3, \pi_4, \ldots, \pi_j$)
that are easy to implement on existing computers and have good
performance characteristics.

Theorem 1: The operators $\pi_1, \pi_2, \pi_3, \ldots, \pi_j$ are
complete (proof could show their equivalence to operators of propositional
calculus and hence are complete by Church's thesis).

Definition: Define a computational unit that is proportional to
the length of time it takes a computer to execute a sequence of
instructions.

Theorem 2: There exists a weighting function $f(\pi_i)$ that has a
value for each $\pi_i$, and that value is proportional to a computational
unit related to the performance of an implementation.

Theorem 3: For each sequence of operators $T_i$ (where $T_i$ is an
unordered sequence of $\pi_j$) on a set of relations, there exists an
equivalent sequence of operators $P_i$ on the same set that results in the
same relation. Proof would show an algorithm for reducing any string of
operators (projections, restrictions, etc.) to an equivalent string of
operators.

Theorem 4: For $T_i$ and $P_i$ there exists the function


for all $\pi_j$ in $T_i$, for all $\pi$ in $P_i$.

Theorem 5: It is always possible to determine the sequence $k$ of
$P_i$ that is a minimum of the function of Theorem 4.

Theorems 1, 2, 3, 4, and 5 give formal insight into the question of
equivalency of operators and sequences of operators that are equivalent
and yet require less computation.

Theorem 6: If a relation has domains $D_1, D_2, D_3, \ldots, D_n$ and
if $D_i$ is some function of $D_j$, then the following restrictions

make the updating problem soluble.

Theorem 7: The set of operators $\pi_1, \ldots, \pi_j$ are consistent.

10. Comparison of Other Views of Data

The major attractiveness of the hierarchical model is that the model
at each level is consistent and simple in concept, which means routines
of each level are always utilizing identical concepts, irrespective of
the actual data. This also allows for extremely powerful storage/retrieval
commands that do not inherently -- or otherwise -- contain a "path" to the
data.

The hierarchical model affords many advantages over conventional
information system design. Within the context of the energy information
system, this section will cite some of these advantages and compare them
with other possible views of the data.

1) A system implementor can operate at a higher level than before,
i.e., the data base need only be defined in terms of its relations
and the operations upon those relations. This is in contrast to
the conventional technique of first designing the internal file
structure of the system followed by a large set of routines to
manipulate that file structure.

2) Additional relations and additional domains within existing
relations may be created after the initial implementation of an
applications program without the need to reprogram or
reorganize the data base. For example, if it became necessary to
start maintaining data on the distance of all fuel terminals
from the water, and there was no previous provision for this
domain in the energy information system, a new domain called DISTANCE
could be created in the TERMINALS relation without disturbing any
of the existing data, and all future interrogations and
manipulations on that domain would generate the correct answers for
policymakers.

3)  Since  the  rows  of  relations  are  order-independent,  insertions  and 
deletions  to  the  data  base  can  be  handled  with  the  same  flexibility 
as  described  for  additions  of  relations  and  domains. 

4) The use of system generated domain inversions provides an
efficient and powerful retrieval capability that is much faster
than a tuple by tuple linear search through the data base.

5) An attractive feature of the data security level is the fact that
access control and integrity control are independent of the data
structure, and so can be modified independently.
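
Points 2 through 4 can be illustrated with a minimal Python sketch. The `Relation` class, its method names, and the sample rows are hypothetical and are not taken from the energy information system. The sketch assumes rows are held as an unordered collection, that a domain added later is simply unknown (`None`) for existing rows, and that an inversion is a map from a domain value to the rows holding it.

```python
from collections import defaultdict


class Relation:
    """Toy relation: a bag of rows, each a dict from domain name to value."""

    def __init__(self, name, domains):
        self.name = name
        self.domains = list(domains)
        self.rows = []          # row order carries no meaning
        self.inversions = {}    # domain -> {value -> set of row positions}

    def insert(self, **values):
        # Domains that are not supplied are stored as None (value unknown).
        row = {d: values.get(d) for d in self.domains}
        self.rows.append(row)
        position = len(self.rows) - 1
        for domain, index in self.inversions.items():
            index[row[domain]].add(position)

    def add_domain(self, domain):
        # Existing rows are extended in place; nothing is reloaded.
        self.domains.append(domain)
        for row in self.rows:
            row[domain] = None

    def invert(self, domain):
        # Build the inversion once; later inserts keep it current.
        index = defaultdict(set)
        for position, row in enumerate(self.rows):
            index[row[domain]].add(position)
        self.inversions[domain] = index

    def select(self, domain, value):
        if domain in self.inversions:
            hits = self.inversions[domain].get(value, set())
            return [self.rows[i] for i in sorted(hits)]
        # No inversion available: fall back to a tuple by tuple scan.
        return [row for row in self.rows if row[domain] == value]


terminals = Relation("TERMINALS", ["NAME", "OWNER"])
terminals.insert(NAME="A", OWNER="X")
terminals.insert(NAME="B", OWNER="Y")
terminals.add_domain("DISTANCE")     # added after the initial implementation
terminals.insert(NAME="C", OWNER="X", DISTANCE=2.5)
terminals.invert("OWNER")
owned_by_x = [row["NAME"] for row in terminals.select("OWNER", "X")]
```

In this sketch the rows inserted before `add_domain` remain valid, and queries on OWNER go through the inversion rather than a scan. Deletion and in-place update are omitted for brevity.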

Other views of data tend to fail on one or more of the above points.

If the energy system had been implemented in a non-hierarchical manner,
say in FORTRAN, a fixed set of arrays would have to be defined and a
maximum file size would be indicated at system generation time. If at any
time in the future any new field or domain was to be added to the data
base, the entire system would have to be redone to handle the change.
This means that the old compiled programs would have to be discarded.
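
The fixed-layout problem can be sketched in Python as an analogy, not as FORTRAN. The record layouts, field sizes, and the `read_terminal_old` function are hypothetical. The sketch assumes a reader written against one fixed record layout and shows that it cannot process records once a DISTANCE field has been appended.

```python
import struct

# Hypothetical fixed layouts: a terminal number (4-byte int) and an 8-byte
# name, then the same record after a 4-byte DISTANCE field is appended.
OLD_LAYOUT = "<i8s"
NEW_LAYOUT = "<i8sf"


def read_terminal_old(record):
    """Reader written against the old layout only."""
    number, name = struct.unpack(OLD_LAYOUT, record)
    return number, name.rstrip(b" ").decode("ascii")


old_record = struct.pack(OLD_LAYOUT, 17, b"BOSTON  ")
new_record = struct.pack(NEW_LAYOUT, 17, b"BOSTON  ", 2.5)

old_result = read_terminal_old(old_record)

try:
    read_terminal_old(new_record)
    old_reader_survives = True
except struct.error:
    # The record length no longer matches the layout the reader expects.
    old_reader_survives = False
```

Every routine that embeds the old layout would need to be rewritten in the same way, which is the sense in which the old programs must be discarded.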
A FORTRAN file would typically be sequentially ordered by some numeric
key, such as terminal number for the terminals file. Any extensive
insertions or deletions of terminal records would create a very inefficient data
base and periodic manual data base reloads would be necessary. Even the use
of an indexed sequential access method (ISAM) would generate cumbersome
overflow tables and pointers after many insertions and deletions to the data
base, which could only be eliminated by complete reloading of the data.
Interrogations would have to be satisfied through linear searches of the
files, and performance could be improved only through the implementation
of specialized question-answering routines. New types of queries would
require more programming.

If the system was to be developed using a hierarchical tree view of the
data, other problems would result. MUMPS [Barrett, Marble, Green, Papillardo
of Mass General Hospital, Boston] is a high-level language with a powerful
tree structure data base facility. A MUMPS version of an energy information
system would have good update, deletion, ordering and insertion capabilities,
but its ability to relate different types of information in the data base and
to answer complex queries without extensive data redundancy would be weak. As
an example, consider the following hierarchy relating terminals to their owners:


TERMINAL A      TERMINAL B      TERMINAL C
    |               |               |
 OWNER X         OWNER Y         OWNER X




A question to MUMPS asking for the owner of a terminal would easily be
answered by accessing the next level of the tree structure. However, if the
question "What terminals does OWNER X own?" was asked, then the only way the
answer could be found using the above structure would be to find the owner of
every terminal, and then collect all terminals owned by OWNER X. To improve
performance, inversions are handled in MUMPS by using prelinked bit maps of
files, but that precludes extensive dynamic changes to the file structure.
If the file was actually inverted like this:


         OWNER X                 OWNER Y
        /       \                   |
TERMINAL A    TERMINAL C        TERMINAL B


then the data would have to be stored twice, and any changes to that data
would have to be made twice to maintain data integrity, not considering the
waste of storage space.
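
The double-storage problem can be sketched in Python. The dictionaries stand in for the two tree structures and the function names are hypothetical, so this is an illustration of the argument and not of MUMPS itself. It assumes the forward tree maps each terminal to its owner and that the inverted tree is a second, separately stored copy.

```python
# Forward tree: each terminal has its owner one level below it.
forward = {
    "TERMINAL A": "OWNER X",
    "TERMINAL B": "OWNER Y",
    "TERMINAL C": "OWNER X",
}

# Inverted tree: the same facts stored a second time, keyed by owner.
inverted = {
    "OWNER X": {"TERMINAL A", "TERMINAL C"},
    "OWNER Y": {"TERMINAL B"},
}


def terminals_of_by_scan(owner):
    """Answer the reverse question from the forward tree alone."""
    return sorted(t for t, o in forward.items() if o == owner)


def is_consistent(forward_tree, inverted_tree):
    """Check that both stored copies describe the same ownership facts."""
    derived = {}
    for terminal, owner in forward_tree.items():
        derived.setdefault(owner, set()).add(terminal)
    stored = {o: ts for o, ts in inverted_tree.items() if ts}
    return derived == stored


scan_result = terminals_of_by_scan("OWNER X")   # visits every terminal
lookup_result = sorted(inverted["OWNER X"])     # direct, but needs the second copy

forward["TERMINAL C"] = "OWNER Y"               # only one copy is changed
stale = not is_consistent(forward, inverted)

inverted["OWNER X"].discard("TERMINAL C")       # the second, matching change
inverted["OWNER Y"].add("TERMINAL C")
repaired = is_consistent(forward, inverted)
```

A single ownership change needs two coordinated writes here. With system generated inversions over one stored relation, the user makes the change once.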


11. Summary

The theoretical structures underlying our work are all related
through  the  hierarchical  concept.  The  relational  model  of  data  is  simply 
the  view  of  data  at  one  level,  a  level  above  all  physical  dependencies. 
Each  level  has  operations  and  performance  issues  that  must  be  studied. 
The  hierarchical  concept  itself  is  a  structure  that  can  be  further  studied 
and  exploited. 

The  major  thrust  of  the  future  research  will  be  to: 

- Define new operations at all levels

- Formally prove the following about the properties of operators
  at each level:

    - Completeness

    - Reduction to equivalent sequences of operators

    - Existence of analytical methods of determining minimal
      sequences

- Implement the following:

    - A modeling interface between this data management and a
      facility like TROLL

    - The NEEMIS application

    - Graphic interface

    - The complete DDL/DML -- security mechanisms and new operators
      (Levels 5, 6, 7, and 8)

- Formally prove the following characteristics of the hierarchical
  approach:

    - Under what conditions will there exist a deterministic
      procedure for proving correctness and integrity of any
      system constructed this way

    - The existence of a procedure for integrity and security of
      a hierarchical system

- Develop analytical models of performance at all levels

- Devise empirical performance experiments at all levels.